Skip to content

Conversation

@abhijat
Copy link
Contributor

@abhijat abhijat commented Nov 16, 2025

A stateless allocator is used to parse JSON from string. The allocator forwards requests to a thread local memory resource pointer which is backed by the mimalloc heap. The memory resource is initialized during engine shard init, same as the CompactObject thread local initialization.

There are two new restrictions that are now in effect when using this stateless allocator:

  • Objects created on one shard should not be destroyed on another shard, and as a result should be moved/copied to other shard. This is because each shard has its own memory resource, and it is possible that the destination shard does not have a mimalloc heap initialized, as is the case with coordinator thread in our test suite.
  • Objects which use the std allocator based parser introduced in server: Parse JSON on destination shard #6061 are not interchangeable with objects allocated on mimalloc heap, their types are different and their allocators are different. If conversion between two types is required, the objects have to be serialized from source type into a string, and then deserialized into the destination type.

@abhijat abhijat force-pushed the abhijat/fix/use-stateless-alloc-for-json branch 4 times, most recently from 76d8385 to 6e2af7f Compare November 16, 2025 08:27
@abhijat
Copy link
Contributor Author

abhijat commented Nov 16, 2025

This is removed and now all parse operations use mimalloc.

A problem with this approach is that not all cores are guaranteed to have called EngineShard::InitThreadLocal and have a memory resource using the mimalloc heap initialized.

This is seen in the failing tests. If the num_shards flag is set to less than the number of cores, some cores will not have a thread local heap for parsing JSON.

There are places in code which just need to parse some string into json without relying on mimalloc heap (eg in cluster config). These places use the default memory resource. They need to use the same allocator which uses the mimalloc heap.

@abhijat
Copy link
Contributor Author

abhijat commented Nov 16, 2025

There are places in code which just need to parse some string into json without relying on mimalloc heap (eg in cluster config). These places use the default memory resource. They need to use the same allocator which uses the mimalloc heap.

The right thing to do here seems to be to defer all parsing of JSON objects until we are on the shard, in the tx callback. At present some parsing is done on coordinator thread to return errors early, and some on the shard where the key resides.

The other uses of JsonFromString such as cluster config module can use the default pmr resource. As they do not interact with compact obj. and related heap.

@abhijat
Copy link
Contributor Author

abhijat commented Nov 16, 2025

will be rebased after #6061, which will introduce two paths for parsing JSON values:

  • parse on any shard using default PMR resource, not intended to use as compact object OR
  • parse on transaction destination shard using mimalloc heap, intended to use as compact object in db

@romange
Copy link
Collaborator

romange commented Nov 16, 2025

@abhijat I am not sure I get it. Why can not we use default resource in coordinator thread together with JSONFromString, and use ShardJsonFromString in shard threads together with thread local heap?

@romange
Copy link
Collaborator

romange commented Nov 16, 2025

Ok, I understand now:

it is possible to work around this by using two allocator types, one for the default heap and one for mimalloc heap. but since the allocator type is part of the json object type, going down this path means that we now have two non interchangeable json object types

but where do they meet? i.e. do we pass a "coordinator" json objects to shard threads? in other words, even today I would expect that those objects would be "non-interchangeable", as we must allocate shard local objects only with shard local heap, so in a way having two disjoin c++ types would make things more explicit and clear.

@abhijat
Copy link
Contributor Author

abhijat commented Nov 17, 2025

but where do they meet? i.e. do we pass a "coordinator" json objects to shard threads? in other words, even today I would expect that those objects would be "non-interchangeable", as we must allocate shard local objects only with shard local heap,

There are a few places where we allocate on coordinator using the default memory resource and pass to the destination shard which uses mimalloc based allocation, and then mix those objects. I think it is done to facilitate early error reporting, but I have changed those call sites in #6061 deferring json parsing to the destination shard.

In fact I found this comment https://github.com/dragonflydb/dragonfly/blob/main/src/server/json_family.cc#L1905-L1907 which mentions this same problem.

so in a way having two disjoin c++ types would make things more explicit and clear.

Yes, I added a new type in that PR called ShortLivedJSON (name is a bit confusing but not sure what else to name it) - which uses the standard allocator, in fact it is an alias for the default json type of jsoncons. It is now used in most places where we do json parsing with std::pmr::get_default_resource()

After that parent PR is merged, the other JSONType which we use everywhere in the code can be changed to use stateless allocator, and the two types will be incompatible and never expected to work with each other.

@romange
Copy link
Collaborator

romange commented Nov 17, 2025

Great!

@abhijat abhijat force-pushed the abhijat/fix/use-stateless-alloc-for-json branch 6 times, most recently from 3fa1c19 to 707e6a7 Compare November 17, 2025 15:56
Comment on lines +1535 to +1537
string s;
val.dump(s);
return JsonFromString(s);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate side effect of not being able to send JsonType object to the coordinator thread, it can only be destroyed on the shard that created it. We have to copy it into a safe type before sending it to coordinator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: mimalloc will deallocate the object from another thread, but the coordinator thread might not have mimalloc heap set up on it.

@abhijat abhijat force-pushed the abhijat/fix/use-stateless-alloc-for-json branch from 707e6a7 to 7de47b0 Compare November 17, 2025 16:24
@abhijat

This comment was marked as resolved.

@abhijat abhijat force-pushed the abhijat/fix/use-stateless-alloc-for-json branch 3 times, most recently from 575e634 to 2f44fe8 Compare November 20, 2025 10:35
A stateless allocator is added for JSON parsing. This allocator forwards
all requests to a thread local pointer to a memory resource, and assumes
that the resource has been initialized when the allocator is
constructed.

Signed-off-by: Abhijat Malviya <[email protected]>
The new stateless allocator is used for JSON type. The backing memory
resource is initialized in engine shard.

The function which parses JSON is refactored to no longer accept a
memory resource, as the allocator is bound to its thread local memory
resource.

Signed-off-by: Abhijat Malviya <[email protected]>
Some unit tests do custom memory resource set up and parse JSON, these
are fixed so that the thread local resource used for parsing is
available before test starts.

Signed-off-by: Abhijat Malviya <[email protected]>
The json expression maker function is templatized so it can be used with
both short lived and storable JSON values.

The short lived variant is used within search family validation.

Signed-off-by: Abhijat Malviya <[email protected]>
@abhijat abhijat force-pushed the abhijat/fix/use-stateless-alloc-for-json branch from 2f44fe8 to ed58a32 Compare November 21, 2025 06:58
@abhijat
Copy link
Contributor Author

abhijat commented Nov 21, 2025

I think this is in good condition now once tests pass. I would like to rename JsonType to something like PrimeValueJson or PvJson to make it clear that it is intended for use as prime value but that can be a separate PR as it will touch a lot of files.

@abhijat abhijat marked this pull request as ready for review November 21, 2025 10:30
@abhijat abhijat requested review from kostasrim and romange November 21, 2025 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants